(57) ABSTRACT

In the natural production of human speech, closure of the
vocal cords usually occurs at well defined instants. These
instants are used for speech processing, such
as glottal synchronous processing or speech synthesis with 
observed natural vocal cord excitation signals. To detect the 
instants of glottal closure from an observed speech signal, 
the observed speech signal is high pass filtered, and a 
temporally localized aggregate of the number and ampli- 
tudes of peaks in the high pass filtered signal is determined 
for possible instants of glottal closure. The instants of glottal 
closure are determined as instants where the aggregate takes 
maximal values.

11 Claims, 4 Drawing Sheets 
HUMAN SPEECH PROCESSING APPARATUS FOR DETECTING INSTANTS OF GLOTTAL CLOSURE

This is a continuation of application Ser. No. 08/557,370, filed Nov. 13, 1995, which is a continuation of application Ser. No. 07/948,186, filed Sep. 21, 1992.

BACKGROUND OF THE INVENTION

The invention relates to a speech signal processing apparatus, comprising detecting means for selectively detecting a sequence of time instants of glottal closure, by determining specific peaks of a time dependent intensity of a speech signal.

Glottal closure, that is, closure of the vocal cords, usually occurs at sharply defined instants in the human speech production process. Knowledge where such instants occur can be used in many speech processing applications. For example, in speech analysis, processing of the signal is often performed in successive time frames, each in the same fixed temporal relation to a respective instant of glottal closure. In this way, the effect of glottal closure upon the signal is more or less independent of the time frame, and differences between frames will be largely due to the change in time of the parameters of the vocal tract. In another application example, a train of glottal excitation signals is fed through a synthetic filter modelling the vocal tract in order to produce synthetic speech. To produce high quality speech, glottal excitations derived from physical speech are used to generate the glottal excitation signal.

For such applications, it is desirable to identify the instants of glottal closure from physically received human speech signals. An apparatus for finding these instants, or at least instants which stand in fixed phase relation to these instants, is known from U.S. Pat. No. 3,940,565. According to this publication, the instant of glottal closure is identified as an instant of maximum amplitude in the signal. To detect this, the received speech signal is fed to a peak detector, and when the resulting peak signal is sufficiently large this detector triggers a flipflop to signal glottal closure.

The disadvantage of this method is that in not all speech signals glottal closure corresponds to the largest peak or even to a single peak. In voiced signals, there may be several peaks distributed over one period which may give rise to false detections. Also there may be several comparably large peaks surrounding each instant of glottal closure, which gives rise to jitter in the detected instants as the maximum jumps from one peak to another. Moreover in unvoiced signals no instants of glottal closure are present, but there are many irregularly spaced peaks, which give rise to false detection.

SUMMARY OF THE INVENTION

It is an object of the invention to improve the robustness of glottal closure detection without requiring complex processing operations.

In an embodiment, the invention realizes the objective because it is characterized in that the apparatus includes

a filter, for forming from the speech signal a filtered signal, through deemphasis of a spectral fraction below a predetermined frequency, the filter then feeding the filtered signal to

an averaging mechanism which generates, through averaging in successive time windows, a time stream of averages representing said time dependent intensity of the speech signal.

In this apparatus, the physical speech signal is first filtered using a high pass or band pass filter which emphasizes frequencies well above the repetition rate of glottal closure. The filtering will emphasize the short term effects of glottal closure over longer term signal development which is due mainly to ringing in the vocal tract after glottal closure. However, in itself the filtering usually will not give rise to a single peak, corresponding to the instant of glottal closure. On the contrary, it will increase the relative contribution of noise peaks, and moreover the effect of glottal closure itself is often distributed over several peaks, an effect which can be worsened by the occurrence of short term echoes.

We have found that near the instant of glottal closure, there will usually be a large peak or many small peaks, both of which correspond to a large local signal density, i.e. aggregate peak number/amplitude count. Therefore, instead of containing only detection means for signal peaks, the apparatus comprises averaging means which determine the signal intensity by averaging contributions from successive windows of time instants. Consequently each instant of glottal closure will correspond to a single peak in the physical intensity, and for example the instant when the peak value is reached or the center of the peak will have a time relation to the instant of glottal closure which is independent of the details of the signal.

An embodiment of an apparatus according to the invention is characterized in that the filtering means are arranged for feeding the filtered signal to the averaging means via rectifying means, for rectifying the filtered signal, through value to value conversion, into a strength signal. By rectifying is meant the process of obtaining a signal with a DC component which is responsive to the amplitude of an AC signal, in this case the signal derived from the [...] signal. A simple example of a rectifying value to value conversion is the conversion of filtered signal values to their respective absolute values. In general, any conversion in which values of opposite sign do not consistently yield exactly opposite converted values qualifies as rectifying, provided values with successively larger amplitudes are converted to converted values with successively larger amplitudes at least in some value range. Examples of rectifying conversions in this sense are taking the exponential of the signal, or of its absolute value, or combinations thereof.

One embodiment of the apparatus according to the invention is characterized in that the conversion comprises squaring of values of the filtered signal. In this way, the DC component of the strength signal, i.e. the physical intensity, represents the energy density of the signal, which will give rise to optimal detection if the amplitudes are normally distributed in the statistical sense.

An embodiment of the apparatus according to the invention is characterized in that, in said averaging, the strength signal is weighted in each of the windows, with weighting coefficients which remain constant as a function of time distance from a centre of the window up to a predetermined distance, and from the predetermined distance monotonously decrease to zero at the edge of the window. A set of weighting coefficients which gradually decreases at the edges of the window mitigates the suddenness of the onset of contribution due to peaks in the filtered signal; this makes the onset of peaks in the physical intensity less susceptible to individual peaks in the filtered signal if this contains several peaks for one instant of glottal closure.

The precise temporal extent of the windows is not critical. However, if the windows are so wide as to encompass more than one successive instant of glottal closure, there will be
contributions to the average which do not belong to a single instant of glottal closure and a poorer signal to noise ratio will generally occur in the intensity. To avoid overlap of contributions from neighboring instants of glottal closure, the extent should be made shorter than the time interval between neighboring instants of glottal closure, which for male voices is in the range of 8 to 10 msec and for female voices is in the range of 4 to 5 msec. Too small an extent incurs a risk of multiple detections, which is reduced as the extent is increased. Depending on the quality of the physical speech signal a minimum extent upward of 1 msec has been found practical; an extent of 3 msec was a good tradeoff for both male and female voices.

One embodiment of the apparatus is characterized in that it comprises width setting means, for setting a temporal width of the windows according to a pitch of the speech signal. The width setting means use a prior estimate of the pitch, i.e. the interval between neighboring instants of glottal closure, to restrict the temporal extent of the window to below this interval. The prior estimate may be obtained in any one of several ways, for example by feeding back an average of the interval lengths between earlier detected instants of glottal closure, or using a separate pitch estimator, or by using a user control selector, etcetera. Since the most significant pitch differences are between male and female voices, a male/female voice selection button may be used for selecting from one of two extents for the window. Accordingly, an embodiment of the apparatus according to the invention is characterized in that the setting means are arranged for setting the temporal width to a first or second extent, the first extent lying between 1 and 5 milliseconds and the second extent lying between 5 and 10 milliseconds.

An embodiment of the apparatus according to the invention is characterized in that the filtering means copy a further spectral fraction of the speech signal above 1 kHz substantially indiscriminately into the filtered signal. This makes the filtering means easy to implement. For example, when the physical speech signal is a sampled signal, with 10 kilosamples per second, samples I_n being identified by a sample time index "n", the expression

$$s_n = I_n - 0.9\, I_{n-1}$$

gives a satisfactory way of producing a filtered signal s_n.

The detection of the instants of glottal closure may be performed by locating locally maximal intensity values, or simply by detecting when the physical intensity crosses a threshold, or by measuring the centre position of peaks. In an embodiment of the apparatus according to the invention detection is accomplished by

determining an average DC content of the strength signal, averaged over a temporal extent wider than the width of the windows, and then

determining whether the time dependent intensity exceeds the average DC content by more than a predetermined factor, excesses corresponding to the specific peaks.

In this way, the thresholds are set automatically and are robust against variations in the nature of the signal. When the predetermined factor is set sufficiently high, unvoiced signals will not lead to detection of any instants of glottal closure.

An embodiment of the apparatus according to the invention is characterized in that the detection systems feed a synchronization input of a frame by frame speech analysis mechanism, for controlling positions of frames during analysis of the physical speech signal.

An embodiment of the apparatus according to the invention is characterized in that the detection mechanism feeds an excitation input of a vocal tract simulator, for forming a synthesized speech signal.

BRIEF DESCRIPTION OF THE DRAWINGS

For a fuller understanding of the invention, reference is had to the following description taken in connection with the accompanying drawings, in which:

FIG. 1 depicts a conventional model of speech production;

FIG. 2 shows an apparatus for frame by frame speech analysis;

FIG. 3 shows a speech signal, an electroglottal signal and three signals obtained by processing the speech signal;

FIG. 4 shows further examples of processing results;

FIG. 5 also shows further examples of processing results;

FIG. 6 shows additional examples of processing results;

FIG. 7 shows an exemplary detector according to the invention for detecting instants of glottal closure by analysis of a speech signal;

FIG. 8 shows the results of an operation to detect glottal closure.

GLOTTAL CLOSURE AND ITS DETECTION

FIG. 1 depicts a conventional model for the physical production of voiced human speech. According to this model, the vocal cords 10 produce a locally periodic train of excitations, which is fed 12 through the vocal tract 14, which effects a linear filter operation upon the train of excitations. The repetition frequency of the excitations, the "pitch" of the speech signal, is usually in the range of 100 Hz to 250 Hz. The train of excitations has a spectrum of peaks separated by intervals corresponding to this frequency, the amplitude of the peaks varying slowly with frequency and disappearing only well into the kHz range. The linear filtering of the vocal tract on the other hand has a strong frequency dependence below 1 kHz, often with pronounced peaks; especially at lower frequencies the spectral shape of the speech signal at the output 16 is therefore determined by the vocal tract.

Physical excitations produced by the vocal cords 10 have been found to have well defined instants of so called glottal closure. These are periodic instants where the vocal cords close, after which the vocal tract filter 14 is left to develop the output signal by itself through ringing. Detection of these instants of glottal closure is used for various purposes in electronic speech processing.

In one example of the use of these instants, speech is synthesized using an electronic equivalent of FIG. 1, with an excitation generation circuit 10 followed by a linear filter. In order to produce high quality synthetic speech, the excitation generation circuit is arranged to generate a train of excitations with natural irregularities; for this purpose observed instants of glottal closure are used.

In another example, speech analysis, i.e. the decomposition of speech, is performed on a frame by frame basis, a frame being a part of the speech signal between two time points; the time points are synchronized by the instant of glottal closure. FIG. 2 shows an example of a speech analysis apparatus that works on this principle. At the input 20, the speech signal is received. It is processed in a processing circuit 21, which apart from the speech signal also receives a frame start signal 22, and an intra frame position pointer 23. Processing by the processing circuit is periodic, the period being reset by the reset input, and the position within the period being determined from the position pointer. The reset input is controlled by a glottal closure
detection circuit 24, which detects instants of glottal closure
by analysis of the speech signal received at the input 20. The 
glottal closure detection circuit 24 also resets a counter 26, 
driven by a clock 25, which in the exemplary apparatus 
generates the intra frame pointer. One advantage of frame by 
frame processing is that there is a fixed relation between the 
phase of glottal excitation and the position in the frame, 
whereby many of the effects of excitation of the vocal cords 
are independent of the particular window considered. There- 
fore the signal variation between windows is dominated by 
the effects of the vocal tract. 

FIG. 3 shows an example of an electroglottal waveform 
32 obtained by electrophysiological measurement, the 
speech signal 30 produced from it, and the results 34, 36, 38 
of processing the speech signal. The electroglottal waveform 
32 has a very strong derivative at periodic instants (e.g. 33). 
These are the instants of glottal closure, and it is an object 
of the invention to determine these instants from the speech
signal 30. As a first step in attaining the object, the speech 
signal 30 is converted into a filtered signal by linear high 
pass filtering. As the order in which linear operations are 
applied to a signal is immaterial for the result, one may 
consider the combined effect of the high pass filtering and 
the vocal tract filter 14 as the result of applying the vocal 
tract filter 14 to a high pass filtered version of the electro- 
glottal waveform. This version will have a constant value 
most of the time, with sharp peaks at instants of glottal 
closure 33. Between the peaks, the development of the high 
pass filtered speech signal is only determined by the vocal 
tract filter 14, which means that successive high pass filtered 
speech signal values should be linearly predictable from 
preceding values, with more or less time invariant prediction 
coefficients. 

At the peaks, this prediction will be incorrect. Detection 
of instants of glottal closure is attained by analyzing the 
amount of deviation that occurs in linear prediction. For this 
purpose, it is not necessary to determine the actual predic- 
tion coefficients; an analysis of the correlation matrix "R", of 
samples of the signal, is sufficient. This correlation matrix
"R" is defined in terms of successive speech samples s_i.

The matrix indices i, j run over a predetermined range of "p"
samples. The length of this range is called the order of the 
matrix, a reference for the position of the range in time is 
called the instant of analysis. The constant "m" is called the 
length of an analysis interval over which the correlation 
values are determined. When the speech samples "s" are 
linearly predictable from their predecessors, the matrix R 
will have at least one eigenvalue equal to zero. In general, 
all eigenvalues of R will be real and greater than or equal to 
zero, and when the speech samples "s" are not exactly 
linearly predictable, due to noise, or inaccuracies in the 
model presented in FIG. 1, the smallest eigenvalue of R will 
at least be near zero. 
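
The defining expression for R did not survive in this text, so the following is only a minimal sketch under an assumed form, R[i, j] = sum over k = 0..m-1 of s[n+k-i] * s[n+k-j] with i, j = 0..p-1, which is consistent with the roles of "p" and "m" described above. The function name, the two-pole resonator standing in for the vocal tract filter 14, and all numeric values are hypothetical. It illustrates that the smallest eigenvalue is close to zero while the signal rings freely and clearly larger when the analysis interval spans an excitation.

```python
import numpy as np

def correlation_matrix(s, n, p, m):
    """Assumed form: R[i, j] = sum_{k=0}^{m-1} s[n+k-i] * s[n+k-j], i, j = 0..p-1."""
    # Row k of X holds the p consecutive samples s[n+k], s[n+k-1], ..., s[n+k-p+1].
    X = np.array([[s[n + k - i] for i in range(p)] for k in range(m)])
    return X.T @ X

# Hypothetical stand-in for FIG. 1: impulses at samples 50 and 130 exciting a
# two-pole resonator (coefficients chosen only for illustration).
N = 200
excitation = np.zeros(N)
excitation[[50, 130]] = 1.0
s = np.zeros(N)
for t in range(2, N):
    s[t] = excitation[t] + 1.6 * s[t - 1] - 0.9 * s[t - 2]

p, m = 4, 20
# eigvalsh returns eigenvalues in ascending order, so index 0 is the smallest.
lam_free = np.linalg.eigvalsh(correlation_matrix(s, 60, p, m))[0]      # samples 57..79: free ringing
lam_excited = np.linalg.eigvalsh(correlation_matrix(s, 120, p, m))[0]  # samples 117..139: spans impulse at 130
print(lam_free, lam_excited)
```

In the free-ringing interval every sample is an exact linear combination of its two predecessors, so the matrix is rank deficient up to rounding; the interval containing the impulse breaks that relation.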

One can use this property of the correlation matrix R to 
detect the amount of deviation from linear predictability, for 
example by evaluating the determinant (which is equal to the 
product of the eigenvalues, and will be small if the smallest 
eigenvalue is near zero), or, in another example, by deter- 
mining the smallest eigenvalue. The logarithm of the deter- 
minant 36 and the smallest eigenvalue 38 are displayed in 
FIG. 3 versus the instant in time at which they are deter- 
mined. They were determined by sampling the filtered 
speech signal "I" at a rate of 10 kHz, subjecting the sampled
values to the following high pass filter in order to obtain the
filtered values "s":

$$s_n = I_n - 0.9\, I_{n-1}$$

The analysis interval length in obtaining FIG. 3 was m=30
samples and the order of the matrix was p=10. It can be seen that
both the logarithm of the determinant 36 and the smallest
eigenvalue 38 exhibit marked peaks at the instants of glottal
closure, i.e. parts of the electroglottal waveform 32 with
steep slopes.
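
As a rough illustration of such a high pass filter, the sketch below assumes the first-order difference s_n = I_n - 0.9 I_{n-1}; the coefficient 0.9 is read from a partly garbled expression in the source and should be treated as an assumption, as should the function name and the choice to take the sample before the first one as zero.

```python
import numpy as np

def high_pass(I, a=0.9):
    """First-order high pass, assumed form s[n] = I[n] - a * I[n-1] (with I[-1] taken as 0)."""
    I = np.asarray(I, dtype=float)
    s = I.copy()
    s[1:] -= a * I[:-1]
    return s

# A constant (zero-frequency) input is attenuated to 1 - a after the first sample.
dc_out = high_pass(np.ones(8))
# An input alternating in sign at half the sample rate is amplified to 1 + a.
alt_out = high_pass(np.array([1.0, -1.0] * 4))
print(dc_out[1:], np.abs(alt_out[1:]))
```

The gain therefore rises from about 0.1 at low frequencies to about 1.9 at half the sample rate, which de-emphasizes the low-frequency part of the spectrum while passing the higher frequencies largely intact.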

However, determination of either the determinant or the
smallest eigenvalue of a matrix requires a substantial amount
of computation. We have found that a similar and at least as
robust a detection of the instant of glottal closure can be
attained by evaluating the sum of the diagonal elements of
the correlation matrix R, i.e. its trace, which is equal to the
sum of its eigenvalues; experiment has shown that all
eigenvalues of the correlation matrix exhibit marked peaks
near the instants of glottal closure. Evaluation of the trace,
however, is a much simpler operation than determining either
the determinant or the smallest eigenvalue: it comes
down to a weighted sum of the squares of the signal values,
where the weight coefficients have a symmetrical trapezoidal
shape as a function of time, the shape having a base
width of m+p and a top width of m-p.
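
The equivalence between the trace and a trapezoidally weighted sum of squares can be checked numerically. The sketch below again assumes the form R[i, j] = sum over k = 0..m-1 of s[n+k-i] * s[n+k-j] with i, j = 0..p-1 (the source's own expression is not legible), and uses random numbers as a hypothetical stand-in for filtered samples. For sampled signals under this assumption the trapezoid has m+p-1 nonzero weights and a plateau of m-p+1 samples, which matches the stated base and top widths to within one sample.

```python
import numpy as np

def trace_direct(s, n, p, m):
    # Build the assumed correlation matrix explicitly and sum its diagonal.
    X = np.array([[s[n + k - i] for i in range(p)] for k in range(m)])
    return np.trace(X.T @ X)

def trace_trapezoid(s, n, p, m):
    # Convolving two rectangles of lengths m and p gives the trapezoidal weights.
    w = np.convolve(np.ones(m), np.ones(p))
    segment = s[n - p + 1 : n + m]          # the m+p-1 samples that enter the matrix
    return np.dot(w, segment ** 2)

rng = np.random.default_rng(0)
s = rng.standard_normal(200)                # hypothetical filtered samples
p, m, n = 10, 30, 50                        # order and interval length quoted for FIG. 3
w = np.convolve(np.ones(m), np.ones(p))
print(trace_direct(s, n, p, m), trace_trapezoid(s, n, p, m))
```

The second function needs only one multiply-accumulate per sample in the window, with no matrix to form or decompose, which is the computational saving the text refers to.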

The result of evaluating the trace of the correlation matrix
is plotted versus the instant of analysis in the third curve 34
of FIG. 3. It will be seen that this curve also exhibits marked
peaks near the instants of glottal closure. Further examples
of processing results are given in FIGS. 4, 5 and 6, which
illustrate various speech signals 40, 50, 60, the result of
evaluating the smallest eigenvalue 46, 56, 66, the logarithm
of the determinant 48, 58, 68 and the trace of the correlation
matrix 44, 54, 64 as a function of the instant of analysis. FIG.
4 also contains the result 42 of filtering signal 40 with a high
pass filter. One should note that in FIG. 3 the instant of
glottal closure coincides with the maximum speech signal
amplitude, and in FIG. 5 it coincides with maximum signal
derivative. This is by no means always the case; in many
speech signals there are several peaks in either the signal or
its derivative or both, and the instant of glottal closure often
does not coincide with these peaks; FIGS. 4 and 6 provide
illustrations of this. In FIG. 6, the highest peaks have little
or no high frequency content and do not give rise to larger
detection signals 64. In FIG. 4, there are three peaks in the
high pass filtered signal near each instant of glottal closure,
and the maximum amplitude occurs variably either at the
first, second or third peak. It will be clear that mere maximum
detection in this case would lead to phase jitter in the
detection of instants of glottal closure, whereas the trace
signal 44 provides a robust detection signal.
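
The trace signal and an automatically set threshold can be tried on a synthetic signal. The following is a sketch under stated assumptions, not the patented circuit: the function name, the resonator coefficients, the pulse spacing of 100 samples, the 201-sample reference average and the factor 1.5 are all hypothetical choices, the high pass is the assumed first-order difference with coefficient 0.9, and the averages are centred on each sample rather than computed causally, so the delay of a physical averaging filter is not modelled.

```python
import numpy as np

def detect_closures(speech, m=30, p=10, long_len=201, factor=1.5, a=0.9):
    x = np.asarray(speech, dtype=float)
    # 1. High pass: assumed first-order difference s[n] = x[n] - a * x[n-1].
    s = x.copy()
    s[1:] -= a * x[:-1]
    # 2. Rectify by squaring to obtain the strength signal.
    strength = s ** 2
    # 3. Average with normalized trapezoidal weights (rectangle m convolved with rectangle p).
    w = np.convolve(np.ones(m), np.ones(p))
    w /= w.sum()
    intensity = np.convolve(strength, w, mode="same")
    # 4. Reference level: average strength over a wider, flat window.
    dc = np.convolve(strength, np.ones(long_len) / long_len, mode="same")
    # 5. Keep local maxima of the intensity that exceed factor * reference level.
    is_peak = np.zeros(len(x), dtype=bool)
    is_peak[1:-1] = (intensity[1:-1] > intensity[:-2]) & (intensity[1:-1] >= intensity[2:])
    return np.flatnonzero(is_peak & (intensity > factor * dc))

# Hypothetical voiced signal: unit impulses every 100 samples through a two-pole resonator.
N = 600
closures = np.arange(100, N, 100)
excitation = np.zeros(N)
excitation[closures] = 1.0
speech = np.zeros(N)
for t in range(2, N):
    speech[t] = excitation[t] + 1.6 * speech[t - 1] - 0.9 * speech[t - 2]

detected = detect_closures(speech)
print(closures, detected)
```

On this idealized input one instant is reported per excitation, each a few samples after the impulse because the signal energy is spread over the ringing that follows it; real speech would need the window width and factor tuned to the speaker and recording.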

Hence, we have found that the trace of the correlation
matrix is a computationally simple and robust way of
marking instants of glottal closure. An exemplary apparatus
for detecting instants of glottal closure is shown in FIG. 7. Here
the speech signal arriving at the input is filtered in a high
pass filter 70, and then squared in the signal converter 72;
subsequently, it is filtered with averaging means 74 which
weights the signal in a window with a finite trapezoidally
shaped impulse response (analysis of the expression for the
correlation matrix shows that this is equivalent to trace
determination). Preferably the extent of the impulse
response should be less than the distance between successive
instants of glottal closure. After the integrator 74, the signal
is thresholded in a threshold detection circuit 76 which selects
the largest output values as indicating glottal closure, but
with a time delay relative to the input speech signal due to the impulse delay of the averaging means 74. In the example shown in FIG. 7, the threshold is fed to the thresholding circuit via a further averaging circuit 78, which determines the average converted signal amplitude over a wider interval than the window of the averaging means 74.

The output of the circuit is illustrated in FIG. 8, where the output 80 of the averaging means 74 is shown, together with the result 82 of further averaging 78, and thresholding with the further average 84.

The effectiveness of the apparatus shown in FIG. 7 can also be understood without reference to the mathematical analysis expounded above. Near the instant of glottal closure, the excitation signal at the point 12 in FIG. 1 contains strong high frequency components. By using the high pass filter 70, these components are emphasized. They are then rectified by squaring them in the rectifier 72, and their density, or signal energy, is measured in the averaging means 74 which thus gets maximum output at the instant of glottal closure.

From this understanding of the effect of the apparatus, a number of variations in the apparatus which will leave it equally effective are readily derived. To begin with, the high pass filter 70 may be replaced with any filter (like a band pass filter) that selectively passes higher frequency components which are chiefly attributed to the sharp variation of the excitation signal near the instant of glottal closure.

Furthermore, the rectifier 72, which in the mathematical analysis used squaring of the signal, may be replaced with any nonlinear conversion, like for example taking a power unequal to two or the exponent of the filtered signal. The only condition is that the nonlinear operation generates a DC bias from an AC signal, which grows as the AC amplitude grows. A necessary and sufficient condition for this is that the nonlinear operation is not purely uneven, i.e. odd (assigning opposite output values to opposite filtered signal values), and grows with amplitude. The nonlinear conversion can be performed by performing actual calculation of a conversion function (like squaring), but in many cases, a lookup table, containing converted values for a series of input values, can be used.

The function of the averaging means 74 is to collect contributions from around the instant of glottal closure, and to distinguish this collection from the contributions collected around other instants. For this purpose, it suffices that averaging extends over less than the full distance between successive instants of glottal closure; the average may be weighted, most weight being given to instants close to the instant under analysis.

The maximum extent of the window must be estimated in advance. This can be done once and for all, by taking the minimum distance that occurs for normal voices, which is about 3 msec. Alternatively, one may provide selection means 79 to adapt the integrator window length to the speaker, for example by using feedback from the observed distance between instants of glottal closure, or using an independent pitch estimate (the pitch being the average frequency of glottal closure). Another possibility is use of a male/female switch button in the selection means 79, which allows the user to select a filter extent corresponding either to typical female voices (distance between instants of glottal closure above 4 msec) or to male voices (above 8 msec).

The trapezoidal shape of the weighting profile of the averaging means 74, which was derived using the trace of the correlation matrix, is not critical and variations in the profile are acceptable, provided it has weighting values which substantially all have the same sign, and decrease in amplitude from a central position of the window. The width of the window defines the delay time of the averaging means 74; in general, the peaks at the output of the averaging means 74 will be delayed with respect to the instants of glottal closure by an interval equal to half the window width.

Finally, the extraction of the instants of glottal closure from the integrator signal can also be varied. For example, one may use a fixed threshold, or an average threshold as in FIG. 7; the average may be multiplied by a predetermined factor in order to make the threshold more or less stringent. Furthermore, instead of thresholding, one may select maxima, i.e. instants of zero derivative, possibly in combination with thresholding.

Although the apparatus as described hereinbefore used separate components, processing sampled signals, it will be clear that the invention is not limited to this: it can be applied equally well to continuous (non sampled) signals, or the processing can be performed by a single computer executing the several processing operations.

What is claimed is:

1. An apparatus for processing a speech signal comprising:

a filter for receiving said speech signal and for generating a filtered speech signal by deemphasizing a spectral fraction of said speech signal below a predetermined frequency;

an averaging circuit coupled to said filter for receiving the filtered speech signal and generating, through averaging in successive time windows, a time stream of average signal corresponding to time dependent intensity of said speech signal; and

a detector for selectively detecting a sequence of time instants of glottal closure by determining peaks of said time dependent intensity of said speech signal.

2. The apparatus of claim 1, further including a rectifier coupled between said filter and said averaging circuit for rectifying said filtered speech signal received by the average circuit, through a value to value conversion, the rectified speech signal being a strength signal.

3. The apparatus as claimed in claim 2, wherein said rectifier squares the values of said filtered speech signal.

4. The apparatus as claimed in claim 3, wherein said averaging circuit weights said strength signal in each of said time windows with weighting coefficients which are constant as a function of time distance from a center of a window to a predetermined distance and wherein said weighting coefficients monotonously decrease from said predetermined distance to an edge of said window.

5. The apparatus as claimed in claim 2, wherein said averaging circuit weights said strength signal in each of said time windows with weighting coefficients which are constant as a function of time distance from a center of a window to a predetermined distance and wherein said weighting coefficients monotonously decrease from said predetermined distance to an edge of said window.

6. The apparatus as claimed in claim 1, further including width setting means coupled to said averaging circuit for setting a temporal width of one of said time windows dependent on a pitch of said speech signal.

7. The apparatus as claimed in claim 6, wherein said width setting means sets the width of one of said time windows to a time range selected from one of a first time range and a second time range, said first time range including between about 1 millisecond and 5 milliseconds and said second time range including from between about 5 milliseconds and 10 milliseconds.

8. The apparatus as claimed in claim 1, wherein said filter copies a further spectral fraction of said speech signal above about 1 kHz into said filtered speech signal.
9. The apparatus as claimed in claim 2, further including
a further averaging circuit for determining an average DC
content of said strength signal, averaged over a temporal
extent wider than the width of one of said windows and
threshold means coupled to said further averaging circuit for
determining whether said time dependent intensity of said
speech signal exceeds the average DC content of said
strength signal by more than a predetermined value.

10. The apparatus as claimed in claim 1, further including
vocal tract simulation means coupled to said detection
means for forming a synthesized speech signal.

11. The apparatus as claimed in claim 1, further including
selection means coupled to said averaging circuit for selecting
the temporal width of the time windows.